

 implement gradient descent


Provable optimal transport with transformers: The essence of depth and prompt engineering

arXiv.org Machine Learning

Can we establish provable performance guarantees for transformers? Establishing such theoretical guarantees is a milestone in developing trustworthy generative AI. In this paper, we take a step toward addressing this question by focusing on optimal transport, a fundamental problem at the intersection of combinatorial and continuous optimization. Leveraging the computational power of attention layers, we prove that a transformer with fixed parameters can effectively solve the optimal transport problem in Wasserstein-2 with entropic regularization for an arbitrary number of points. Consequently, the transformer can sort lists of arbitrary sizes up to an approximation factor. Our results rely on an engineered prompt that enables the transformer to implement gradient descent with adaptive stepsizes on the dual optimal transport problem. Combining the convergence analysis of gradient descent with Sinkhorn dynamics, we establish an explicit approximation bound for optimal transport with transformers, which improves as depth increases. Our findings provide novel insights into the essence of prompt engineering and depth for solving optimal transport. In particular, prompt engineering boosts the algorithmic expressivity of transformers, allowing them to implement an optimization method. With increasing depth, transformers can simulate several iterations of gradient descent.
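As a rough illustration of the entropic optimal transport problem the abstract refers to, here is a minimal pure-Python sketch. It uses standard log-domain Sinkhorn updates of the two dual potentials (block-coordinate ascent on the entropic dual), not the paper's transformer construction or its adaptive-stepsize gradient descent. The names `sinkhorn_plan` and `approximate_ranks` are hypothetical, and the evenly spaced target grid, `eps`, and iteration count are illustrative assumptions. Approximate sorting is read off by sending each input to the target that receives most of its mass, which works because transport with squared cost in one dimension is monotone.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sinkhorn_plan(x, y, eps=0.01, iters=200):
    """Entropic OT plan between uniform measures on 1-D points x and y with
    squared cost. The plan is P[i][j] = exp((f[i] + g[j] - C[i][j]) / eps),
    and Sinkhorn alternately updates f and g to fix its marginals."""
    n, m = len(x), len(y)
    cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
    log_a, log_b = -math.log(n), -math.log(m)  # uniform marginals 1/n and 1/m
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Choose f so that every row of the plan sums to 1/n.
        f = [eps * log_a - eps * logsumexp([(g[j] - cost[i][j]) / eps for j in range(m)])
             for i in range(n)]
        # Choose g so that every column of the plan sums to 1/m.
        g = [eps * log_b - eps * logsumexp([(f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(m)]
    # Recover the primal transport plan from the dual potentials.
    return [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)] for i in range(n)]

def approximate_ranks(x, eps=0.01):
    """Hypothetical helper: rank of x[i] = index of the target slot that
    receives most of its mass. Assumes len(x) >= 2 and x roughly in [0, 1]."""
    n = len(x)
    targets = [j / (n - 1) for j in range(n)]  # increasing grid on [0, 1]
    plan = sinkhorn_plan(x, targets, eps=eps)
    return [max(range(n), key=lambda j: row[j]) for row in plan]

print(approximate_ranks([0.9, 0.1, 0.5]))  # [2, 0, 1]
```

Smaller `eps` gives a plan closer to an exact permutation but generally needs more iterations; larger `eps` spreads each row's mass over several targets, so the argmax ranks become less reliable.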


How to implement Gradient Descent in Python

#artificialintelligence

We will try to build a single-neuron network that can predict admissions to a graduate school. The data we will use is shared above in Google Drive. The first 5 rows of data are shown below. The first column, admit, indicates whether the student is admitted to the school; this will be the target for our model. The second column, gre, and the third column, gpa, are numerical features for the student; the fourth column, rank, is a categorical feature. We will apply one-hot encoding to the categorical feature to add dummy columns.
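A minimal pure-Python sketch of this setup follows. Because the article's dataset is not reproduced here, the rows in the usage example are made-up toy values, not the real data. Assumptions: rank takes the values 1 to 4, gre and gpa are standardized before training, and the neuron is a sigmoid unit trained by full-batch gradient descent on the cross-entropy loss (whose gradient with respect to the pre-activation is simply `p - y`); the article's own loss function and learning rate may differ. All function names are hypothetical.

```python
import math
import statistics

def one_hot(value, categories):
    """Dummy columns: 1.0 in the slot matching `value`, 0.0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]

def build_features(rows, ranks=(1, 2, 3, 4)):
    """rows: list of (gre, gpa, rank). Standardizes gre and gpa to zero mean
    and unit variance, then appends the one-hot encoded rank."""
    gres = [r[0] for r in rows]
    gpas = [r[1] for r in rows]
    gre_mu, gre_sd = statistics.mean(gres), statistics.pstdev(gres)
    gpa_mu, gpa_sd = statistics.mean(gpas), statistics.pstdev(gpas)
    return [[(gre - gre_mu) / gre_sd, (gpa - gpa_mu) / gpa_sd] + one_hot(rank, ranks)
            for gre, gpa, rank in rows]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Output of the single neuron: sigmoid of a weighted sum plus bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def mean_loss(w, b, X, y):
    """Average cross-entropy loss; p is clamped away from 0 and 1 for log()."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, b, xi), 1e-12), 1 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

def train_neuron(X, y, lr=0.5, epochs=1000):
    """Full-batch gradient descent on the weights w and bias b."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # dLoss/d(pre-activation)
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Made-up toy rows (gre, gpa, rank) and admit labels, NOT the article's data.
rows = [(700, 3.8, 1), (500, 2.9, 4), (620, 3.4, 2), (450, 3.0, 3)]
admit = [1, 0, 1, 0]
X = build_features(rows)
w, b = train_neuron(X, admit)
print([round(predict(w, b, x)) for x in X])
```

Standardizing gre (hundreds) and gpa (single digits) keeps both features on a similar scale, so one learning rate works for every weight.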


Machine Learning for Humans, Part 2.1: Supervised Learning

#artificialintelligence

The goal of gradient descent is to find the minimum of our model's loss function by iteratively getting a better and better approximation of it. Imagine yourself walking through a valley with a blindfold on. Your goal is to find the bottom of the valley. How would you do it? A reasonable approach would be to touch the ground around you and move in whichever direction the ground is sloping down most steeply.
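The blindfolded-walker picture translates directly into code: at each step, measure the slope (gradient) at the current point and take a small step in the opposite, downhill direction. This is a minimal sketch, assuming a hand-written gradient function and a fixed learning rate; the two-variable "valley" below is a made-up example whose lowest point is at (3, -1).

```python
def gradient_descent(grad, start, lr=0.1, steps=100):
    """Repeatedly move against the gradient with a fixed step size (learning rate)."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - lr * gi for p, gi in zip(point, g)]
    return point

# Made-up valley: f(x, y) = (x - 3)**2 + 2 * (y + 1)**2, lowest at (3, -1).
def valley_grad(p):
    x, y = p
    return [2 * (x - 3), 4 * (y + 1)]

print([round(v, 6) for v in gradient_descent(valley_grad, [0.0, 0.0])])  # [3.0, -1.0]
```

If the learning rate is too large, the steps overshoot the valley floor and can diverge; for this particular valley, any fixed rate below 0.5 converges, because the steeper direction has curvature 4.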